Add AI policy to CONTRIBUTING.md #452
Conversation
Commit title authored by GitHub Copilot; the change itself, and this description, were authored by the signing contributor.
I don't want to play devil's advocate, but what is considered LLM usage? I think it's a spectrum more than a boolean. Is asking a conversational AI service to help you with an instruction LLM usage? If not, how do you actually prove empirically that something was directly written by an AI and not simply AI-adjacent? (AI helped but didn't write the file directly)
generally speaking anything that uses an llm in any way constitutes llm usage |
Just to be clear: if at any point during my PR I ask an LLM about anything related to the PR, would that automatically mark the entire PR as LLM-assisted? That seems a bit harsh.
I believe the point is about llm-generated content being used
The policy covers any WRITE operations on the repository, including contents, commits, issues, and pull requests. Simply asking a question does not fall into that scope.
adrienntindall
left a comment
Will merge on approval from @tgsm
This should be stated in the policy.
Expanded AI policy to clarify disclosure requirements and exceptions.
darn copilot prefilling commit messages and descriptions... where do i turn this off?
37f3ae0
- Pull request reviews
- Issues
- Issue comments
- Messages to maintainers
A couple of thoughts on the implementation side of this policy. It might help to further clarify the boundary of what counts as LLM usage. The policy already distinguishes between contributing text and "asking questions about the code", but there are a few potential gray areas that might still come up in practice; explicitly stating whether those fall under disclosure could help avoid confusion.

The disclosure requirement itself might benefit from being a bit more specific. Right now it says LLM usage must be disclosed in full, but it doesn't say where or how. For example, should that go in the PR description, the commit message, or somewhere else? It is also not clear how much granularity needs to be provided regarding the extent of LLM involvement. A small guideline there could make compliance easier.

Since detection may not always be straightforward, it might be worth considering a warning for first-time violations where intent to conceal isn't clear. That would give contributors a chance to correct behavior while still allowing maintainers to take stronger action in cases of deliberate nondisclosure. Overall the intent of the policy makes sense; these are just a few areas where a little extra clarity might help with implementation.
this is already pretty clear imo
I agree with red's interpretation. Basically, if we imagine copyright could be assigned to AI-generated contents, then the contents subject to that hypothetical copyright are the ones that must be disclosed. So just copy-pasting AI-generated code must be disclosed; modifying AI-generated code would be a derivative work, so it also must be disclosed; and letting AI modify your own code would be an AI-generated derivative work, so it also must be disclosed. When asking the AI how to do something and then doing it yourself, the AI has no authorship over your implementation, so it doesn't need to be disclosed. All the same applies to text in issues instead of code.

I agree with tkolarik that first-time violations should result in a warning instead of an instant ban, and that the form of disclosure should be specified. I believe it makes more sense for disclosure about code to be in the commit description as well as the PR: Pull Requests are a GitHub thing separate from the actual Git repository, while the commit description is guaranteed to remain for as long as the repository exists.

Though I'm personally not too fond of accepting usage of AI, for what it's worth. And from what I've seen, AI-written PR summaries and the like are kind of weird.
I've been mulling over what the AI policy should be, and I think I want to draw a stricter boundary than what's already in here. Mainly, I think that the use of an LLM to generate C from assembly or to document existing code should be outright banned. Other use cases, like using it to automate tedious tasks such as function name changes, are permissible.

These AI PRs only attempt to take low-hanging fruit, which can be done by a skilled maintainer in a matter of minutes. The use of AI to decompile these files therefore doesn't significantly speed up the actual decompilation effort, since we would still need to go back and document everything anyway. I'm also concerned, after viewing one of the accounts that made an AI contribution, that these "contributors" are reputation farming by opening easy and inoffensive PRs that don't have any glaring problems with them. Not to mention that one of them outright stole the code of a separate, already-open PR.

Then in terms of documentation, it should be obvious why an AI shouldn't document code that is meant to be used by humans, especially since from what I've seen it only recognizes how to document generic functions (alloc, accessors, setters) and nothing more intricate than that.
| --> | ||
- [ ] This work adheres to the [AI policy](CONTRIBUTING.md#ai-policy).
## **Discord contact info**
I think that discord verification is important but I'm hesitant to require it as part of a PR for everyone since it'll be public information, and discord stalkers are real (speaking from experience). It might just be worth having them join and verify by sending a message in #pokeheartgold
We can add the following to the above checklist instead:
- [ ] The author has joined the pret discord community and sent a message in #pokeheartgold to verify that they are human
Reasonably though any human contributor would do this anyways, so it might be redundant.
lhearachel
left a comment
I have a couple thoughts here, mostly to tighten the seals. I'd like to also have this policy in pokeplatinum.
The following use cases are deemed acceptable and stand as exceptions to the above statement, provided that they are disclosed in full. While this document is not legally binding, we do expect you to represent yourself truthfully. Undisclosed use of AI may result in a ban.
- Automating the boring stuff. So long as the task is clearly defined by a human, is boiled down to pure execution with no further creative decision-making, and is sufficiently tedious to perform by hand, you may delegate it to an AI. However, we do strongly encourage you to do as much as you can by hand or write a script to automate the task, rather than invoking an AI to do it for you.
- Asking general knowledge questions about C code, the ARM processor, Pokémon, etc.
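The "write a script" alternative mentioned in the policy can be as small as a one-liner. Below is a minimal sketch of a scripted function rename; the identifier pair (`sub_0206B9C8` → `Pokedex_GetSeenCount`), the file names, and the `/tmp` path are purely illustrative, and it assumes GNU `sed` (for `-i` and `\b` word boundaries):

```shell
# Hypothetical rename done by a script rather than by an LLM.
# Set up a throwaway demo tree with the old identifier in it.
mkdir -p /tmp/rename_demo
printf 'u16 sub_0206B9C8(void);\n' > /tmp/rename_demo/pokedex.h
printf 'u16 sub_0206B9C8(void) { return 0; }\n' > /tmp/rename_demo/pokedex.c

# Find every file containing the old name and substitute whole-word matches.
grep -rl 'sub_0206B9C8' /tmp/rename_demo \
    | xargs sed -i 's/\bsub_0206B9C8\b/Pokedex_GetSeenCount/g'

grep -r 'Pokedex_GetSeenCount' /tmp/rename_demo
```

In a real checkout you would point `grep -rl` at the repository root (or use `git grep -l` to respect `.gitignore`) instead of a temp directory.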
suggestion: Provide a means by which a potential contributor can ask questions that doesn't involve an LLM. This could be as simple as nudging them to either join the pret Discord server or open an issue.
We unequivocally prohibit the use of artificial intelligence (AI) large language models (LLMs) to generate contributions to this project, meaningful or otherwise. Any pull request found to have used AI for these tasks will be closed, and the contributor will be banned from interacting with the repository. This is a zero-tolerance policy, and we do not provide any avenue for appeal.
The following use cases are deemed acceptable and stand as exceptions to the above statement, provided that they are disclosed in full. While this document is not legally binding, we do expect you to represent yourself truthfully. Undisclosed use of AI may result in a ban.
question: Do we want to explicitly outline an exception for translating text from issues / comments / pull requests? Does that count as a contribution?
Would it be better to outline what a "contribution" is? In my mind, a contribution to these repositories is any of the following:
- Decompilation of assembly instructions to C
- Documentation of C code for readability or semantic-meaning
- Text-documentation (e.g., writing a "how the game works" Markdown document)
- Creating issues
- Unpacking archives into editable assets
- Creating tools for translating in-repo assets to in-game assets
contribution is pretty defined imo
anything that adds to the repo is a contribution
imo there are many better tools for translation than llms but I don't know how we'd ban that?
but if it's translation that is committing to the repo and uses llm, it shouldn't be added to the repo
anything that adds to the repo is a contribution
Is creating an issue to report some inconsistency or misdocumentation a contribution to the repository? It contributed to the GitHub plumbing, but not inherently to the git tree, which I think is fuzzily-defined as written.
I'd say so
creating AI slop issues is not gonna be very productive or get a good response
Right. I agree. My argument is that "contribution" is a bit ambiguous by itself, and it's low-cost to outline examples (including some language like "not limited to") of what we deem to be contributions.
Commit and PR title authored by GitHub Copilot; the change itself, and this description, were authored by the signing contributor.
PR checklist
- [ ] The ROMs build and match (`make compare_heartgold && make compare_soulsilver`).
- [ ] The code has been formatted (`git clang-format`).

Discord contact info